SNA_Learn: Networks in Animal Health

Teaching
Hands-on intro to animal movement network analysis in R.
Author

Sara C. Sequeira

Published

September 20, 2025

1 Introduction and Example Network

Animal movement connects premises (farms, markets, dealers, raisers, slaughter plants). These connections form contact networks that shape pathogen spread, traceability and targeting of interventions (e.g., targeted surveillance, biosecurity implementation and control measures). In this course, we’ll use R to:

  • Build networks from animal movement records.

  • Visualize networks and annotate nodes by type (state, breed, sex, etc.).

  • Compute core metrics (in/out-degree, strength, betweenness, components).

  • “Slice” networks over time to see how structure changes.

Core idea: Each shipment is typically a directed edge from origin → destination, with weight = number of animals moved (or number of shipments). Node attributes capture premises type (e.g., Dairy, Dealer, Market) and region (e.g., state, zip code). Edge attributes could be animal details (e.g., breed, age, sex of the animals moved).

1.1 Load & Import the dataset

To start, let’s import a typical hypothetical dataset of cattle movements in the US and have a look at the variables.

library(tidyverse)

movements <- readr::read_csv("data/SampleDataset_3m.csv", show_col_types = FALSE)

glimpse(movements)
| Variable | Description | Example |
|---|---|---|
| date | Date of shipment (YYYY-MM-DD) | 2025-03-15 |
| origin_id | Unique code for the origin premises | P01 |
| origin_type | Type of origin premises (e.g., Dairy, Dealer, Market) | Dairy |
| destination_id | Unique code for the destination premises | P10 |
| destination_type | Type of destination premises | Feedlot |
| head | Number of animals moved in this shipment | 25 |
| category | Age/production stage of animals (Calf, Feeder, Adult) | Calf |
| sex | Sex of animals (M, F, Mixed) | M |
| breed | Breed category (Holstein, Angus, etc.) | Holstein |
| purpose | Reason for movement (Sale, Raising, Finishing, Slaughter) | Raising |

1.2 Identify the nodes - “Nodelist”

Nodes are the entities in our network. In this case, each node is a unique premises (farm, dealer, market, etc.), but a node could also be a region. We can list all the unique origins and destinations to define the set of nodes.

# Build node list from both origins and destinations
length(unique(c(movements$origin_name, movements$destination_name))) # how many unique premises?

# if we are interested, we can also add the farm type and state as attributes.
nodes <- movements %>%
  select(origin_name, origin_type, origin_state,
         destination_name, destination_type, destination_state) %>%
  pivot_longer(
    everything(),
    names_to = c("role", ".value"),
    names_pattern = "(origin|destination)_(.*)") %>%
  transmute(prem_id = name, type, state) %>%
  distinct()

# Now quick summaries:
nodes %>% count(type, sort = TRUE)
nodes %>% count(state, sort = TRUE)

nodes %>% 
  count(state, type) %>%
  ggplot(aes(x = reorder(state, n), y = n, fill = type)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Premises by State and Type",
    x = "State",
    y = "Number of Premises",
    fill = "Type") +
  scale_y_continuous(breaks = 0:10) +  # <- only integer ticks
  theme_classic()
Tip: Check-in questions:
  • How many different premises could you identify?
  • Is any state better represented than the others?
  • What types of premises did you find?
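
To see what the reshaping step does, here is a minimal sketch on a made-up three-row table. The premises names, types and states below are hypothetical and are not taken from the course dataset; only the column naming pattern (origin_* / destination_*) mirrors the code above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy movements: 3 shipments among 3 premises
toy_moves <- tibble(
  origin_name       = c("Farm A", "Farm A", "Market M"),
  origin_type       = c("Dairy", "Dairy", "Market"),
  origin_state      = c("NY", "NY", "PA"),
  destination_name  = c("Market M", "Feedlot F", "Feedlot F"),
  destination_type  = c("Market", "Feedlot", "Feedlot"),
  destination_state = c("PA", "OH", "OH")
)

# Stack the origin_* and destination_* columns so that each row
# describes one premises, then keep one row per unique premises
toy_nodes <- toy_moves %>%
  pivot_longer(
    everything(),
    names_to = c("role", ".value"),
    names_pattern = "(origin|destination)_(.*)") %>%
  transmute(prem_id = name, type, state) %>%
  distinct()

toy_nodes

# Sanity check: TRUE when every premises appears exactly once
anyDuplicated(toy_nodes$prem_id) == 0
```

The final check is worth running on real data too: if the same premises is recorded with two different types or states, it shows up twice in the node list, and graph_from_data_frame() will stop with a duplicate vertex name error.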

1.3 Create an “Edgelist” of movements

In this step, we collapse the raw movement records into what we call an edgelist.

Each row represents a directed connection from an origin premises to a destination premises. Along with the connection itself, we summarize two key attributes:

  • shipments = the total number of movements between that pair of premises.

  • animals = the total number of head moved across all those shipments.

This edgelist is the backbone of the network: it defines the edges (links) between premises and stores attributes that can later be used to weight connections (e.g., thicker arrows for more shipments, larger nodes for more animals moved).

edges <- movements %>%
  group_by(origin_name, destination_name) %>%
  summarise(
    shipments = n(),
    animals   = sum(head),
    .groups   = "drop"
  )

head(edges)
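
As a quick illustration of the collapsing step, the sketch below applies the same group_by() and summarise() pattern to four made-up shipments (hypothetical premises and head counts, not the course data). Two of the shipments share the same origin and destination, so they should merge into a single edge.

```r
library(dplyr)

# Hypothetical raw records: Farm A ships to Market M twice
toy_moves <- tibble(
  origin_name      = c("Farm A", "Farm A", "Farm A", "Market M"),
  destination_name = c("Market M", "Market M", "Feedlot F", "Feedlot F"),
  head             = c(10, 15, 5, 20)
)

# One row per origin -> destination pair
toy_edges <- toy_moves %>%
  group_by(origin_name, destination_name) %>%
  summarise(
    shipments = n(),        # number of raw records for the pair
    animals   = sum(head),  # total head moved for the pair
    .groups   = "drop"
  )

toy_edges

# Collapsing should never change the total number of animals moved
sum(toy_edges$animals) == sum(toy_moves$head)
```

Four records become three edges, and the Farm A to Market M edge carries 2 shipments and 25 animals.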

1.4 Visualize the network - Igraph Package

It looks like we are all set. What now? The igraph R package makes our life simple.

igraph is a library and R package for network analysis, providing a pain-free implementation of graph algorithms and fast handling of large graphs that have millions of nodes and edges. By using this package, we create a special object of class “igraph” (printed with an IGRAPH header).

For example:

library(igraph)
g <- graph_from_edgelist(
  matrix(c("A","B","B","C","C","A"), ncol=2, byrow=TRUE),
  directed = TRUE) 
g
IGRAPH e892929 DN-- 3 3 -- 
+ attr: name (v/c)
+ edges from e892929 (vertex names):
[1] A->B B->C C->A
class(g)
[1] "igraph"
is_simple(g)
[1] TRUE
# Let's visualize the network!
plot(g)

# How many nodes/vertices does it have?
V(g)
+ 3/3 vertices, named, from e892929:
[1] A B C
# How many edges?
E(g)
+ 3/3 edges from e892929 (vertex names):
[1] A->B B->C C->A

By default, graph_from_edgelist() treats the network as directed. To build an undirected network instead, set directed = FALSE:

g_undirected <- graph_from_edgelist(
  matrix(c("A","B","B","C","C","A"), ncol=2, byrow=TRUE),
  directed = FALSE) 
plot(g_undirected)


Let’s now visualize our animal movement network!

library(igraph)
g <- graph_from_data_frame(
  d = edges, 
  vertices = nodes,   # your node list with attributes
  directed = TRUE)

plot(g,
  layout = layout_with_fr(g),
  vertex.size = 8,
  vertex.label = NA,
  edge.arrow.size = 0.3,
  edge.color = "grey40")

1.5 The 5-Number Summary Stats

Now that we’ve visualized our animal movement network, the next step is to quantify its structure with five simple statistics:

1. Size

Total number of nodes (premises) and edges (movements) in the network.

  • How many premises and movements do we observe?
# How many nodes/vertices does it have?
V(g)
vcount(g) 

# How many edges?
E(g)
ecount(g)    

2. Density

Proportion of actual edges to the maximum possible edges.

  • How “connected” is our system compared to a fully connected network?
edge_density(g, loops = FALSE)
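
For a directed network without self-loops, there are n × (n − 1) possible edges among n nodes, so density is m / (n × (n − 1)) for m observed edges. The sketch below checks this by hand on a hypothetical four-node toy network (the node labels are made up).

```r
library(igraph)

# Hypothetical toy network: 4 premises, 4 directed movements
toy <- graph_from_edgelist(
  matrix(c("A", "B",
           "B", "C",
           "C", "A",
           "A", "D"), ncol = 2, byrow = TRUE),
  directed = TRUE)

n <- vcount(toy)  # number of nodes
m <- ecount(toy)  # number of edges

# Observed edges divided by all possible directed edges (no self-loops)
manual_density <- m / (n * (n - 1))

manual_density
edge_density(toy, loops = FALSE)
```

Both values should agree: 4 observed edges out of 12 possible ones.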

3. Number of components

The number of subgroups of nodes where everyone is connected directly or indirectly.

  • Are there isolated clusters of premises not linked to the rest?
components(g, mode = "weak")$no  # number of weakly connected components
components(g, mode = "strong")$no  # number of strongly connected components
Tip: Check-in question:
  • Why is the number of weakly connected components different from the number of strongly connected components?
plot(g,
  layout = layout_with_fr(g),
  vertex.size = 8,
  vertex.label = NA,
  vertex.color = components(g, mode = "strong")$membership,  # each component gets its own color
  edge.arrow.size = 0.3,
  edge.color = "grey40"
)
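
A small toy network can make the weak/strong distinction concrete. In the sketch below (hypothetical node labels), A, B and C form a directed cycle, D only receives from C, and E ships to F in isolation. Weak components ignore direction; strong components require that every node can reach every other node by following the arrows.

```r
library(igraph)

# Hypothetical toy network: a cycle A -> B -> C -> A, a dead end C -> D,
# and a separate pair E -> F
toy <- graph_from_edgelist(
  matrix(c("A", "B",
           "B", "C",
           "C", "A",
           "C", "D",
           "E", "F"), ncol = 2, byrow = TRUE),
  directed = TRUE)

weak   <- components(toy, mode = "weak")
strong <- components(toy, mode = "strong")

weak$no    # direction ignored: {A, B, C, D} and {E, F}
strong$no  # direction respected: {A, B, C}, {D}, {E}, {F}

# List the members of each strongly connected component
split(V(toy)$name, strong$membership)
```

In movement data, premises that only receive animals (such as slaughter plants) or only send them cannot be part of a directed cycle, so they tend to end up as single-node strong components.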

4. Average Path Length and/or Diameter

Diameter is the longest of all shortest paths between any two nodes (it indicates how far, in steps, an outbreak could extend across the network), while the average path length is the mean of all shortest paths (it indicates how many steps, on average, separate a premises from any other premises it can reach).

  • On average, how many steps separate one premises from another? What about the maximum distance?
mean_distance(g, directed = FALSE)
diameter(g, directed = FALSE)
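
The directed argument changes the answer, sometimes a lot. As a minimal sketch on a hypothetical four-node network, the example below compares both settings. With direction ignored, the network is a simple chain A - B - C - D; with direction respected, no path is longer than one step.

```r
library(igraph)

# Hypothetical toy network: A -> B, C -> B, C -> D
toy <- graph_from_edgelist(
  matrix(c("A", "B",
           "C", "B",
           "C", "D"), ncol = 2, byrow = TRUE),
  directed = TRUE)

# Ignoring direction: shortest paths are 1, 2, 3, 1, 2, 1 (mean 10/6)
mean_distance(toy, directed = FALSE)
diameter(toy, directed = FALSE)

# Following the arrows: only A -> B, C -> B and C -> D can be travelled
mean_distance(toy, directed = TRUE)
diameter(toy, directed = TRUE)
```

Ignoring direction, as done above for the movement network, describes how close premises are through any contact; respecting direction describes how far animals (and pathogens moving with them) can actually travel.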

5. Clustering Coefficient (or Transitivity)

The probability that two neighbors of a node are themselves connected. It will be 0 if there are no triangles, and 1 if all neighbors of every node are connected to each other.

  • Do we see “triangles” (tight local clusters) in our network?
transitivity(g, type = "global")
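
Global transitivity can be computed by hand as 3 × (number of triangles) / (number of connected triples), where a connected triple is a node together with two of its neighbors. The sketch below uses a hypothetical toy network with one triangle (A, B, C) and one extra premises D attached to C. Note that transitivity() ignores edge direction.

```r
library(igraph)

# Hypothetical toy network: triangle A-B-C plus a pendant node D on C
toy <- graph_from_edgelist(
  matrix(c("A", "B",
           "B", "C",
           "C", "A",
           "C", "D"), ncol = 2, byrow = TRUE),
  directed = TRUE)

# Connected triples centered on each node: choose(k, 2) for a node with
# k neighbors (A: 1, B: 1, C: 3, D: 0)
k <- degree(toy, mode = "all")
triples <- sum(choose(k, 2))

# One triangle, counted once per corner
manual_transitivity <- 3 * 1 / triples

manual_transitivity
transitivity(toy, type = "global")
```

Both values should equal 0.6: three of the five connected triples are closed.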

1.6 Node Centrality

We can also calculate measures that capture some more general notion of ‘importance’ of a node - typically referred to as node/vertex centrality measures. Some of them include degree (in-degree, out-degree or total degree), weighted degree or strength (in-strength, out-strength or total strength), closeness and betweenness centrality.

1) Degree

Degree is the number of edges attached to a node. What is the degree of each premises in this network, as represented in the graph?

    River Beef Ranch       Pioneer Dealer        County Market 
                   8                   23                   25 
     Prairie Feedlot      Blue Beef Ranch         North Dealer 
                  17                    8                   26 
           Oak Dairy         Metro Market         Mesa Feedlot 
                   8                   24                   19 
     Vista Slaughter         South Dealer  Sunrise Calf Raiser 
                  14                   25                   15 
      Central Market          Green Dairy    Harvest Slaughter 
                  26                    8                   14 
       Valley Raiser Hillside Calf Raiser           Lake Dairy 
                  17                   17                    9 
         Maple Dairy     Cedar Beef Ranch 
                   9                    8 
1.1. What if we want to add weights to our degree explicitly?

Instead of simply counting the number of edges (degree), we can account for the magnitude of movements by summing up the edge weights:

# Total animals moved (in + out)
V(g)$strength_animals_total <- strength(g, mode = "all", weights = E(g)$animals)

# Store the number of shipments per edge as the edge weight (used for edge widths in the plots)
E(g)$weight <- E(g)$shipments
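
To see how strength differs from degree, consider the sketch below on a hypothetical three-edge network (made-up premises labels and head counts). Every node has the same degree, yet the numbers of animals handled differ by an order of magnitude.

```r
library(igraph)

# Hypothetical toy edgelist with an 'animals' edge attribute
toy_edges <- data.frame(
  from    = c("A", "A", "C"),
  to      = c("B", "C", "B"),
  animals = c(100, 5, 10)
)
toy <- graph_from_data_frame(toy_edges, directed = TRUE)

deg     <- degree(toy, mode = "all")                                # edges per node
str_all <- strength(toy, mode = "all", weights = E(toy)$animals)    # animals in + out
str_in  <- strength(toy, mode = "in",  weights = E(toy)$animals)    # animals received
str_out <- strength(toy, mode = "out", weights = E(toy)$animals)    # animals sent

data.frame(node = V(toy)$name, deg, str_in, str_out, str_all, row.names = NULL)
```

All three nodes have degree 2, but A sends 105 animals, B receives 110, and C handles only 15 in total.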

Once we have defined our weights, we can visualize them!

# Plot scaled by total animals moved
plot(g,
  layout = layout_with_fr(g),
  vertex.size = scales::rescale(V(g)$strength_animals_total, to = c(2,30)),
  vertex.label = NA,
  edge.width  = scales::rescale(E(g)$weight, to = c(0.01, 4)),
  edge.arrow.size = 0.3,
  edge.color = "grey40",
  main = "Node size scaled by total animals moved (imports + exports)")

This already tells us much more about the role of each premises in the network, right? It might also help to distinguish node types by color (which types of premises have more connections?):

# Define your palette
pal <- c(Dairy = "skyblue2",
    CalfRaiser = "seagreen",
    Dealer     = "purple3",
    Market     = "yellow",
    Feedlot    = "saddlebrown",
    Slaughter  = "tomato3",
    Beef       = "orange")

# Assign colors
types <- as.character(V(g)$type)
cols  <- pal[types]
V(g)$color <- cols

set.seed(42)
{
  plot(g,
    layout = layout_with_fr(g),
    vertex.size  = scales::rescale(V(g)$strength_animals_total, to = c(4,35)),
    vertex.color = V(g)$color,
    vertex.label = NA,
    edge.width   = scales::rescale(E(g)$weight, to = c(0.01, 4)),
    edge.arrow.size = 0.2,
    edge.color  = "grey40",
    main = "Node color defines premise type"
  )
  legend("topright",
         legend = intersect(names(pal), unique(as.character(V(g)$type))),
         col    = pal[intersect(names(pal), unique(as.character(V(g)$type)))],
         pch = 19, bty = "n", cex = 0.9, title = "Premises type")
}

1.2. We can also explore in-degree and out-degree separately!
# Animals received (in-strength) and animals sent (out-strength) per premises
V(g)$strength_animals_in <- strength(g, mode = "in", weights = E(g)$animals)
V(g)$strength_animals_out <- strength(g, mode = "out", weights = E(g)$animals)

# Frame two side-by-side plots; par() returns the previous settings,
# which can be restored afterwards with par(old_par)
old_par <- par(mfrow = c(1,2), mar = c(1,1,1,2))

  plot(g,
    layout = layout_with_fr(g),
    vertex.size  = scales::rescale(V(g)$strength_animals_in, to = c(4,35)),
    vertex.color = V(g)$color,
    vertex.label = NA,
    edge.width   = scales::rescale(E(g)$weight, to = c(0.01, 4)),
    edge.arrow.size = 0.2,
    edge.color  = "grey40",
    main = "Node size based on in-strength (animals received)")
  legend("topright",
         legend = intersect(names(pal), unique(as.character(V(g)$type))),
         col    = pal[intersect(names(pal), unique(as.character(V(g)$type)))],
         pch = 19, bty = "n", cex = 0.9, title = "Premises type")

  plot(g,
    layout = layout_with_fr(g),
    vertex.size  = scales::rescale(V(g)$strength_animals_out, to = c(4,35)),
    vertex.color = V(g)$color,
    vertex.label = NA,
    edge.width   = scales::rescale(E(g)$weight, to = c(0.01, 4)),
    edge.arrow.size = 0.2,
    edge.color  = "grey40",
    main = "Node size based on out-strength (animals sent)")
  legend("topright",
         legend = intersect(names(pal), unique(as.character(V(g)$type))),
         col    = pal[intersect(names(pal), unique(as.character(V(g)$type)))],
         pch = 19, bty = "n", cex = 0.9, title = "Premises type")

Tip: Check-in questions:
  • Looking at the two networks, are the biggest receivers (highest in-strength nodes) also the major senders (highest out-strength nodes) of animals?

      # check the top 3 nodes by animals received and by animals sent
      V(g)$name[order(V(g)$strength_animals_in,  decreasing = TRUE)[1:3]]
      V(g)$name[order(V(g)$strength_animals_out, decreasing = TRUE)[1:3]]
  • If they are not the same, what could this mean for disease transmission?

2) Closeness centrality

Closeness centrality is the inverse of the sum of all the distances between node i and all the other nodes in the network.

Nodes are more prominent to the extent that they are close to all other nodes in the network (most often ignoring the direction of links for animal networks).

What does that mean practically? Premises with high closeness centrality could spread disease quickly across the system, even if they don’t handle the largest number of animals. Think of these nodes as shortcuts in the network: surveillance at these nodes might help us detect diseases earlier, since they are more central in terms of reach.

# Compute closeness (ignore direction; unweighted)
V(g)$closeness <- closeness(g, mode = "all", weights = NA, normalized = TRUE)

# Simple plot: size by closeness, color by type
plot(g,
  layout       = layout_with_fr(g),
  vertex.size  = scales::rescale(V(g)$closeness, to = c(6, 34)),
  vertex.color = V(g)$color,
  vertex.label = NA,
  edge.width   = scales::rescale(E(g)$weight %||% 1, to = c(0.2, 4)),
  edge.arrow.size = 0.20,
  edge.color   = "grey65",
  main         = "Closeness centrality (undirected)")

legend("topright", legend = names(pal), col = pal, pch = 19, bty = "n", cex = 0.8)
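
To connect the definition to the numbers, the sketch below recomputes normalized closeness by hand on a hypothetical four-node chain. With normalized = TRUE, closeness is (n − 1) divided by the sum of shortest-path distances from a node to all others, which assumes here that the network is connected.

```r
library(igraph)

# Hypothetical toy network: a chain A -> B -> C -> D
toy <- graph_from_edgelist(
  matrix(c("A", "B",
           "B", "C",
           "C", "D"), ncol = 2, byrow = TRUE),
  directed = TRUE)

# Shortest-path distances, ignoring direction (rows and columns = nodes)
d <- distances(toy, mode = "all")

# (n - 1) / sum of distances to every other node
manual_closeness <- (vcount(toy) - 1) / rowSums(d)

manual_closeness
closeness(toy, mode = "all", weights = NA, normalized = TRUE)
```

The two middle nodes (B and C) score 0.75 and the two end nodes (A and D) score 0.5: the middle of the chain is fewer steps away from everyone else.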

Tip: Check-in questions:
  • Who are the highest closeness centrality nodes?
# top 3 nodes
V(g)$name[order(V(g)$closeness, decreasing = TRUE)[1:3]]

3) Betweenness centrality

Betweenness centrality, as the name indicates, measures the extent to which a node lies on paths between other nodes.

Nodes have high betweenness if they are in a position to observe or control the “flow of information” in the network.

What does that mean practically? Premises with high betweenness centrality act like bridges or brokers. They don’t necessarily move the most animals, but they sit on the shortest paths between other premises, meaning they can control or facilitate the flow of animals (and pathogens) between otherwise weakly connected parts of the system. Targeting high-betweenness nodes (e.g., through biosecurity campaigns or movement restrictions) can interrupt transmission routes more effectively than focusing on large-volume sites.

# Compute betweenness (often undirected for animal networks)
V(g)$betweenness <- betweenness(g, directed = FALSE, weights = NA, normalized = TRUE)

# Visualize
plot(g,
  layout       = layout_with_fr(g),
  vertex.size  = scales::rescale(V(g)$betweenness, to = c(6, 34)),
  vertex.color = V(g)$color,
  vertex.label = NA,
  edge.width   = scales::rescale(E(g)$weight %||% 1, to = c(0.2, 4)),
  edge.arrow.size = 0.20,
  edge.color   = "grey65",
  main         = "Betweenness centrality (undirected)")

legend("topright", legend = names(pal), col = pal, pch = 19, bty = "n", cex = 0.8)
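
The bridge idea is easiest to see on a toy network. In the sketch below (hypothetical node labels), two tightly connected groups of three premises are linked only through a single premises X. X has few connections, but every shortest path between the two groups passes through it.

```r
library(igraph)

# Hypothetical toy network: triangle A-B-C, triangle D-E-F,
# joined only through X (C -> X -> D)
toy <- graph_from_edgelist(
  matrix(c("A", "B",
           "B", "C",
           "A", "C",
           "C", "X",
           "X", "D",
           "D", "E",
           "E", "F",
           "D", "F"), ncol = 2, byrow = TRUE),
  directed = TRUE)

# Unnormalized betweenness, ignoring direction:
# number of shortest paths between other node pairs that pass through a node
btw <- betweenness(toy, directed = FALSE, weights = NA, normalized = FALSE)
deg <- degree(toy, mode = "all")

data.frame(node = V(toy)$name, degree = deg, betweenness = btw, row.names = NULL)
```

X has degree 2, lower than C and D, yet it has the highest betweenness because all 9 pairs made of one premises from each group are connected through it. Removing X would split the network in two.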

Tip: Check-in questions:
  • Who are the highest betweenness centrality nodes?
# top 3 nodes
V(g)$name[order(V(g)$betweenness, decreasing = TRUE)[1:3]]